NSF PAR Search | NSF Public Access Repository

riboviz 2: a flexible and robust ribosome profiling data analysis and visualization workflow

https://doi.org/10.1093/bioinformatics/btac093

Cope, Alexander L; Anderson, Felicity; Favate, John; Jackson, Michael; Mok, Amanda; Kurowska, Anna; Liu, Junchen; MacKenzie, Emma; Shivakumar, Vikram; Tilton, Peter; et al (February 2022, Bioinformatics)
Valencia, Alfonso (Ed.)
Abstract Motivation Ribosome profiling, or Ribo-seq, is the state-of-the-art method for quantifying protein synthesis in living cells. Computational analysis of Ribo-seq data remains challenging due to the complexity of the procedure, as well as variations introduced for specific organisms or specialized analyses. Results We present riboviz 2, an updated riboviz package, for the comprehensive transcript-centric analysis and visualization of Ribo-seq data. riboviz 2 includes an analysis workflow built on the Nextflow workflow management system for end-to-end processing of Ribo-seq data. riboviz 2 has been extensively tested on diverse species and library preparation strategies, including multiplexed samples. riboviz 2 is flexible and uses open, documented file formats, allowing users to integrate new analyses with the pipeline. Availability and implementation riboviz 2 is freely available at github.com/riboviz/riboviz.
Full Text Available
Liftoff: accurate mapping of gene annotations

https://doi.org/10.1093/bioinformatics/btaa1016

Shumate, Alaina; Salzberg, Steven L (May 2021, Bioinformatics)
Valencia, Alfonso (Ed.)
Abstract Motivation Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well-annotated. Results One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously annotated reference genome. Here, we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.3% of human protein-coding genes to a chimpanzee genome assembly with 98.2% sequence identity. Availability and implementation Liftoff can be installed via bioconda and PyPI. In addition, the source code for Liftoff is available at https://github.com/agshumate/Liftoff. Supplementary information Supplementary data are available at Bioinformatics online.
Full Text Available
Protein contact map refinement for improving structure prediction using generative adversarial networks

https://doi.org/10.1093/bioinformatics/btab220

Maddhuri Venkata Subramaniya, Sai Raghavendra; Terashi, Genki; Jain, Aashish; Kagaya, Yuki; Kihara, Daisuke (March 2021, Bioinformatics)
Valencia, Alfonso (Ed.)
Abstract Motivation Protein structure prediction remains as one of the most important problems in computational biology and biophysics. In the past few years, protein residue–residue contact prediction has undergone substantial improvement, which has made it a critical driving force for successful protein structure prediction. Boosting the accuracy of contact predictions has, therefore, become the forefront of protein structure prediction. Results We show a novel contact map refinement method, ContactGAN, which uses Generative Adversarial Networks (GAN). ContactGAN was able to make a significant improvement over predictions made by recent contact prediction methods when tested on three datasets including protein structure modeling targets in CASP13 and CASP14. We show improvement of precision in contact prediction, which translated into improvement in the accuracy of protein tertiary structure models. On the other hand, observed improvement over trRosetta was relatively small, reasons for which are discussed. ContactGAN will be a valuable addition in the structure prediction pipeline to achieve an extra gain in contact prediction accuracy. Availability and implementation https://github.com/kiharalab/ContactGAN. Supplementary information Supplementary data are available at Bioinformatics online.
Full Text Available
MixTwice: large-scale hypothesis testing for peptide arrays by variance mixing

https://doi.org/10.1093/bioinformatics/btab162

Zheng, Zihao; Mergaert, Aisha M; Ong, Irene M; Shelef, Miriam A; Newton, Michael A (March 2021, Bioinformatics)
Valencia, Alfonso (Ed.)
Abstract Summary Peptide microarrays have emerged as a powerful technology in immunoproteomics as they provide a tool to measure the abundance of different antibodies in patient serum samples. The high dimensionality and small sample size of many experiments challenge conventional statistical approaches, including those aiming to control the false discovery rate (FDR). Motivated by limitations in reproducibility and power of current methods, we advance an empirical Bayesian tool that computes local FDR statistics and local false sign rate statistics when provided with data on estimated effects and estimated standard errors from all the measured peptides. As the name suggests, the MixTwice tool involves the estimation of two mixing distributions, one on underlying effects and one on underlying variance parameters. Constrained optimization techniques provide for model fitting of mixing distributions under weak shape constraints (unimodality of the effect distribution). Numerical experiments show that MixTwice can accurately estimate generative parameters and powerfully identify non-null peptides. In a peptide array study of rheumatoid arthritis, MixTwice recovers meaningful peptide markers in one case where the signal is weak, and has strong reproducibility properties in one case where the signal is strong. Availability and implementation MixTwice is available as an R software package https://cran.r-project.org/web/packages/MixTwice/. Supplementary information Supplementary data are available at Bioinformatics online.
Full Text Available
CLUE: exact maximal reduction of kinetic models by constrained lumping of differential equations

https://doi.org/10.1093/bioinformatics/btab010

Ovchinnikov, Alexey; Pérez Verona, Isabel; Pogudin, Gleb; Tribastone, Mirco (February 2021, Bioinformatics)
Valencia, Alfonso (Ed.)
Abstract Motivation Detailed mechanistic models of biological processes can pose significant challenges for analysis and parameter estimations due to the large number of equations used to track the dynamics of all distinct configurations in which each involved biochemical species can be found. Model reduction can help tame such complexity by providing a lower-dimensional model in which each macro-variable can be directly related to the original variables. Results We present CLUE, an algorithm for exact model reduction of systems of polynomial differential equations by constrained linear lumping. It computes the smallest dimensional reduction as a linear mapping of the state space such that the reduced model preserves the dynamics of user-specified linear combinations of the original variables. Even though CLUE works with non-linear differential equations, it is based on linear algebra tools, which makes it applicable to high-dimensional models. Using case studies from the literature, we show how CLUE can substantially lower model dimensionality and help extract biologically intelligible insights from the reduction. Availability and implementation An implementation of the algorithm and relevant resources to replicate the experiments herein reported are freely available for download at https://github.com/pogudingleb/CLUE. Supplementary information Supplementary data are available at Bioinformatics online.
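The idea of exact linear lumping in the abstract above can be illustrated with a toy example. The sketch below is not the CLUE algorithm and does not use its code: it only checks, for a hand-picked linear system x' = A x, that a given lumping matrix L satisfies the exactness condition L A = Â L, and that the reduced system y' = Â y then reproduces y = L x. The matrices A, L and Â, the initial state, and the helper names are all illustrative assumptions; CLUE itself handles polynomial systems and searches for the smallest such L.

```python
# Toy illustration of exact linear lumping (assumed example, not CLUE itself).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def is_exact_lumping(L, A, A_hat, tol=1e-12):
    """Check the exactness condition L A = A_hat L entry by entry."""
    lhs, rhs = matmul(L, A), matmul(A_hat, L)
    return all(abs(lhs[i][j] - rhs[i][j]) <= tol
               for i in range(len(lhs)) for j in range(len(lhs[0])))

def euler(M, state, h, steps):
    """Integrate state' = M state with fixed-step explicit Euler."""
    for _ in range(steps):
        deriv = matvec(M, state)
        state = [s + h * d for s, d in zip(state, deriv)]
    return state

# Original 3-variable system x' = A x (hypothetical rate constants).
A = [[-3.0, 1.0, 0.0],
     [1.0, -3.0, 0.0],
     [2.0, 2.0, -1.0]]

# Lumping: y1 = x1 + x2 (the quantity to preserve), y2 = x3.
L = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# Reduced 2-variable system y' = A_hat y.
A_hat = [[-2.0, 0.0],
         [2.0, -1.0]]

x0 = [1.0, 2.0, 0.5]
y0 = matvec(L, x0)

x_end = euler(A, x0, 0.01, 500)
y_end = euler(A_hat, y0, 0.01, 500)

# Gap between lumping the full solution and solving the reduced model.
max_gap = max(abs(a - b) for a, b in zip(matvec(L, x_end), y_end))
print(is_exact_lumping(L, A, A_hat), max_gap)
```

Because the exactness condition holds, lumping the full trajectory and integrating the reduced model agree up to floating-point rounding, even though the reduced model has fewer variables.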
Full Text Available
Impact of lossy compression of nanopore raw signal data on basecalling and consensus accuracy

https://doi.org/10.1093/bioinformatics/btaa1017

Chandak, Shubham; Tatwawadi, Kedar; Sridhar, Srivatsan; Weissman, Tsachy (December 2020, Bioinformatics)
Valencia, Alfonso (Ed.)
Abstract Motivation Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw data is inherently noisy, lossy compression has potential to significantly reduce space requirements without adversely impacting performance of downstream applications. Results We explore the use of lossy compression for nanopore raw data using two state-of-the-art lossy time-series compressors, and evaluate the tradeoff between compressed size and basecalling/consensus accuracy. We test several basecallers and consensus tools on a variety of datasets at varying depths of coverage, and conclude that lossy compression can provide 35–50% further reduction in compressed size of raw data over the state-of-the-art lossless compressor with negligible impact on basecalling accuracy (≲0.2% reduction) and consensus accuracy (≲0.002% reduction). In addition, we evaluate the impact of lossy compression on methylation calling accuracy and observe that this impact is minimal for similar reductions in compressed size, although further evaluation with improved benchmark datasets is required for reaching a definite conclusion. The results suggest the possibility of using lossy compression, potentially on the nanopore sequencing device itself, to achieve significant reductions in storage and transmission costs while preserving the accuracy of downstream applications. Availability and implementation The code is available at https://github.com/shubhamchandak94/lossy_compression_evaluation. Supplementary information Supplementary data are available at Bioinformatics online.
Full Text Available
projectR: an R/Bioconductor package for transfer learning via PCA, NMF, correlation and clustering

https://doi.org/10.1093/bioinformatics/btaa183

Sharma, Gaurav; Colantuoni, Carlo; Goff, Loyal A; Fertig, Elana J; Stein-O’Brien, Genevieve (March 2020, Bioinformatics)
Valencia, Alfonso (Ed.)
Abstract Motivation Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in analysis of large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis. Availability and implementation projectR is available on Bioconductor and at https://github.com/genesofeve/projectR. Contact gsteinobrien@jhmi.edu or ejfertig@jhmi.edu Supplementary information Supplementary data are available at Bioinformatics online.
Full Text Available